Support Vector Machine Tutorial
Install the package
install.packages("e1071", repos = "http://cran.rstudio.com/")
Check Data Sample
library(e1071)
-The built-in iris data set is a sample for classifying three species of iris (a flower).
-Species: Iris setosa, Iris versicolor, Iris virginica (150 rows, 50 per species).
-Features: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
#quick check
iris
Data Preview
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
. . .
## 148          6.5         3.0          5.2         2.0  virginica
## 149          6.2         3.4          5.4         2.3  virginica
## 150          5.9         3.0          5.1         1.8  virginica
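The row and class counts stated above can be confirmed directly. This is a minimal sketch that uses only base R functions; it assumes the `iris` data set that ships with R.

```r
# iris ships with base R (package 'datasets'), so no extra install is needed.
data(iris)

dim(iris)                    # number of rows and columns
table(iris$Species)          # number of rows per species
sapply(iris[, 1:4], class)   # type of each of the four feature columns
```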
-A 2D scatter plot shows the relationship between two features. With four features there are C(4,2) = 6 such plots.
-Only one of them is shown here, as an example of how to make the plot.
#preview
# as.numeric() on the Species factor already yields 1 = setosa, 2 = versicolor, 3 = virginica
i1 <- as.numeric(iris$Species)
plot(iris$Sepal.Length, iris$Petal.Width,
     xlab = "Sepal length", ylab = "Petal width",
     main = "IRIS", pch = i1, col = i1)
legend("bottomright", c("setosa", "versicolor", "virginica"), pch = c(1, 2, 3), col = c(1, 2, 3))
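The remaining feature pairs can be plotted the same way. The loop below is a sketch, not part of the original tutorial; the names `features`, `feature_pairs` and `species_id` are introduced here for illustration only.

```r
# All unordered pairs of the four feature columns: choose(4, 2) = 6 pairs.
features <- names(iris)[1:4]
feature_pairs <- combn(features, 2)      # 2 x 6 character matrix, one pair per column

species_id <- as.integer(iris$Species)   # 1 = setosa, 2 = versicolor, 3 = virginica

old_par <- par(mfrow = c(2, 3))          # 2 x 3 grid of panels
for (k in seq_len(ncol(feature_pairs))) {
  x_name <- feature_pairs[1, k]
  y_name <- feature_pairs[2, k]
  plot(iris[[x_name]], iris[[y_name]],
       xlab = x_name, ylab = y_name,
       pch = species_id, col = species_id)
}
par(old_par)                             # restore the previous layout
```

If a quick overview is enough, `pairs(iris[, 1:4], col = species_id)` draws a scatter-plot matrix of the same pairs in one call.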
Separate Data into Training and Test Samples
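set.seed(1234)
# randomly assign each row to group 1 (training, about 70%) or group 2 (test, about 30%)
# named split_id rather than all, so that base::all() is not masked
split_id <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train <- iris[split_id == 1, ]
test <- iris[split_id == 2, ]
train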
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
. . .
## 145          6.7         3.3          5.7         2.5  virginica
## 146          6.7         3.0          5.2         2.3  virginica
## 148          6.5         3.0          5.2         2.0  virginica
## 150          5.9         3.0          5.1         1.8  virginica
test
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 5            5.0         3.6          1.4         0.2     setosa
## 14           4.3         3.0          1.1         0.1     setosa
## 16           5.7         4.4          1.5         0.4     setosa
## 26           5.0         3.0          1.6         0.2     setosa
## 28           5.2         3.5          1.5         0.2     setosa
. . .
## 137          6.3         3.4          5.6         2.4  virginica
## 140          6.9         3.1          5.4         2.1  virginica
## 142          6.9         3.1          5.1         2.3  virginica
## 147          6.3         2.5          5.0         1.9  virginica
## 149          6.2         3.4          5.4         2.3  virginica
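The split above draws a group label for every row independently, so the training and test sizes are only approximately 70% and 30%. If fixed sizes are wanted, one possible alternative (not the approach used in this tutorial) is to sample row indices without replacement. The names `train_rows`, `train_exact` and `test_exact` are hypothetical.

```r
set.seed(1234)
n <- nrow(iris)

# Draw exactly 70% of the row indices, without replacement.
train_rows  <- sample(n, size = round(0.7 * n))
train_exact <- iris[train_rows, ]
test_exact  <- iris[-train_rows, ]

nrow(train_exact)            # 105 rows
nrow(test_exact)             # 45 rows
table(train_exact$Species)   # class counts depend on the seed
```

Neither method stratifies by species, so it is worth checking the class counts in both parts before training.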
SVM Classifier
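-Type: C-classification (the available types are C-classification, nu-classification, one-classification (for novelty detection), eps-regression and nu-regression).
-Cost: the constant C that penalizes the slack variables, i.e. margin violations.
-Kernel: radial (the available kernels are linear, polynomial, radial basis and sigmoid).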
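# fit on the four feature columns; column 5 holds the Species label
# the fitted model is named model_radial so that it does not share a name with the svm() function
model_radial <- svm(train[, 1:4], train[, 5], type = "C-classification", cost = 10, kernel = "radial")
pred <- predict(model_radial, test[, 1:4], decision.values = TRUE)
# confusion matrix: rows are predicted classes, columns are actual classes
table(pred, test[, 5])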
##             
## pred         setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0         12         2
##   virginica       0          0        14
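A confusion matrix like the one above can be reduced to an overall accuracy and a per-class recall. The sketch below re-enters the printed counts by hand so that it runs on its own; `conf`, `accuracy` and `recall` are names chosen for this example.

```r
# Counts copied from the confusion matrix above
# (rows = predicted class, columns = actual class).
classes <- c("setosa", "versicolor", "virginica")
conf <- matrix(c(10,  0,  0,
                  0, 12,  2,
                  0,  0, 14),
               nrow = 3, byrow = TRUE,
               dimnames = list(pred = classes, actual = classes))

accuracy <- sum(diag(conf)) / sum(conf)   # correct predictions / all predictions
recall   <- diag(conf) / colSums(conf)    # per class: correctly predicted / actual

accuracy   # 36 / 38, about 0.947
recall     # setosa 1, versicolor 1, virginica 0.875
```

With the live objects from the tutorial, `mean(pred == test[, 5])` gives the same accuracy without retyping the table.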
-Judging from the plot in the Data Preview section, a linear kernel may give a better result than the radial basis kernel.
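# same settings as before, with the kernel changed to linear
model_linear <- svm(train[, 1:4], train[, 5], type = "C-classification", cost = 10, kernel = "linear")
pred <- predict(model_linear, test[, 1:4], decision.values = TRUE)
table(pred, test[, 5])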
##             
## pred         setosa versicolor virginica
##   setosa         10          0         0
##   versicolor      0         12         0
##   virginica       0          0        16
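The cost value of 10 used above is fixed by hand. One way to choose it from the training data is cross-validation with `tune.svm()` from e1071. This is a sketch only: the cost grid is an arbitrary assumption, the formula interface is used instead of the matrix interface shown above, and the selected value depends on the random folds.

```r
library(e1071)

set.seed(1234)
# Same random split as in the tutorial.
split_id <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
train <- iris[split_id == 1, ]

# Cross-validate a linear-kernel SVM over a small, hypothetical grid of cost values.
tuned <- tune.svm(Species ~ ., data = train,
                  kernel = "linear",
                  cost = c(0.1, 1, 10, 100))

tuned$best.parameters    # cost with the lowest cross-validated error
tuned$best.performance   # that cross-validated error rate
```

The model refitted with the selected cost is available as `tuned$best.model` and can be passed to `predict()` in the same way as the models above.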